ESM 244 2024
Bren School of Environmental Science
2024-02-12
Extract information using [] and [[]]
purrr has useful functions to help us deal with lists
ggplot objects and most fitted statistical models in R (e.g. lm) are stored as lists
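A minimal sketch of the two extraction operators, and of a model object behaving like a list. The list `x` is made up for illustration, and using the built-in `mtcars` data with the formula `mpg ~ wt` is an assumption, not something taken from the slides.

```r
# A small named list (hypothetical example data)
x <- list(a = 1:3, b = "text")

x["a"]     # single bracket: returns a list of length 1
x[["a"]]   # double bracket: returns the integer vector itself

# A fitted model is also a list, so the same tools apply
fit <- lm(mpg ~ wt, data = mtcars)
is.list(fit)             # TRUE
names(fit)[1:3]          # first few elements stored in the model
fit[["coefficients"]]    # the same values coef(fit) returns
```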
Computers do thousands to millions of tasks, really fast
For loops are useful tools, but have limitations
Don’t integrate into the tidyverse very well
Difficult to interpret
Nesting is confusing
One error breaks the whole process
Can be slow if not constructed properly
R is a functional language (i.e. we use functions to get stuff done), so why not use a functional iterative process?
Because its first argument is the data or information, map() works really well in pipes
Map returns the results as a list
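As a rough illustration of both points (data goes in first, a list comes out), here is a sketch with made-up input. The `map_dbl()` call at the end is one of purrr's typed variants, shown only for contrast.

```r
library(purrr)

# The list (data) is the first argument, so it can arrive through a pipe
means <- list(a = 1:10, b = c(2, 4, 6)) %>%
  map(mean)

means         # a list with elements a and b
class(means)  # "list"

# Typed variants return an atomic vector instead of a list
map_dbl(list(a = 1:10, b = c(2, 4, 6)), mean)
```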
Imagine you are tasked with running regressions over subsets of data, each with a different regression specification. How would you do it?
Map can apply regressions to any number of subsets
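One way this might look, assuming the built-in `mtcars` data split by its `cyl` column. The formula `mpg ~ wt` is a placeholder chosen for illustration and is not necessarily the specification used in the lecture.

```r
library(purrr)

models <- mtcars %>%
  split(mtcars$cyl) %>%             # one data frame per cylinder count
  map(~ lm(mpg ~ wt, data = .x))    # one fitted model per subset

length(models)  # one model per subset
names(models)   # the cylinder values used to split the data

# Pull the slope on wt out of every model
map_dbl(models, ~ coef(.x)[["wt"]])
```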
The dataset has only 3 cylinder groups, but there could have been 1,000, and the same map() call would store 1,000 regression models
pmap lets us put in any number of inputs as a list
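A sketch of pmap() pairing a different formula with each subset. The `specs` list and the helper `fit_one()` are hypothetical names introduced here, and `mtcars` is again an assumed dataset.

```r
library(purrr)

# Hypothetical specifications: one formula per cylinder group
specs <- list(
  formula = list(mpg ~ wt, mpg ~ hp, mpg ~ wt + hp),
  cyl     = c(4, 6, 8)
)

# Argument names match the names in the list, so pmap() lines them up
fit_one <- function(formula, cyl) {
  lm(formula, data = mtcars[mtcars$cyl == cyl, ])
}

fits <- pmap(specs, fit_one)
length(fits)     # one model per specification
map(fits, coef)  # coefficients of each model
```

Adding another input only requires another named element in `specs` and a matching argument in `fit_one()`.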
What am I trying to accomplish?
How is my data currently stored?
How will my data be passed to map?
Build (or use) a function that accepts everything
I’ve used map to iterate bioeconomic models over hundreds of thousands of parameter combinations and to run machine learning algorithms on thousands of datasets
Beyond this class you will encounter big datasets, and purrr is a great way to iterate over them.
An error won’t break the entire process and force you to start again. Nothing hurts more than running code for hours to get an error at the end.
Afterwards, you know which models ran into errors
safely() stores each error alongside the result, while possibly() returns a default value you choose whenever an error occurs
# Load purrr for safely(), map(), and the pipe
library(purrr)
# Make up some data
dat = structure(list(group = c("a", "a", "a", "a", "a", "a", "b", "b", "b"),
                     x = c("A", "A", "A", "B", "B", "B", "A", "A", "A"),
                     y = c(10.9, 11.1, 10.5, 9.7, 10.5, 10.9, 13, 9.9, 10.3)),
                class = "data.frame", row.names = c(NA, -9L))
# Define a safe lm function
safelm = safely(.f = lm)
dat %>%
  split(dat$group) %>%
  map(~safelm(y ~ x, data = .x)) %>%
  map("error") # Pull out errors
$a
NULL

$b
<simpleError in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]): contrasts can be applied only to factors with 2 or more levels>
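For comparison, a sketch of the possibly() counterpart on the same made-up data. The wrapper name `possible_lm` and the choice of `NULL` as the fallback value are assumptions made for illustration.

```r
library(purrr)

dat <- data.frame(
  group = c("a", "a", "a", "a", "a", "a", "b", "b", "b"),
  x     = c("A", "A", "A", "B", "B", "B", "A", "A", "A"),
  y     = c(10.9, 11.1, 10.5, 9.7, 10.5, 10.9, 13, 9.9, 10.3)
)

# possibly() returns `otherwise` instead of stopping when lm() errors
possible_lm <- possibly(lm, otherwise = NULL)

fits <- dat %>%
  split(dat$group) %>%
  map(~ possible_lm(y ~ x, data = .x))

map_lgl(fits, is.null)  # TRUE marks the groups whose model failed
compact(fits)           # drop the failures and keep the fitted models
```

safely() is the better fit when you need to read the error message later; possibly() is simpler when a placeholder value is enough.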
The furrr package allows you to access your computer or server’s multiple cores
Can speed things up substantially: for independent tasks, by up to roughly the number of cores used
Works just like purrr map, but sends blocks of data to your computer’s other cores
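A minimal sketch of the furrr workflow, assuming the furrr and future packages are installed. The worker count of 2 and the toy function `slow_square()` are placeholders for illustration.

```r
library(future)
library(furrr)

# Start 2 background R sessions (an assumed count; match it to your machine)
plan(multisession, workers = 2)

# Hypothetical slow task standing in for a real model run
slow_square <- function(x) {
  Sys.sleep(0.5)
  x^2
}

# Same interface as purrr::map(), but elements are spread across workers
res <- future_map(1:4, slow_square)
unlist(res)

plan(sequential)  # return to ordinary single-session evaluation
```

Starting workers and shipping data to them has a cost, so very fast tasks may not run any quicker in parallel.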
Tired of waiting for code to run and you have no idea how long it will take?
purrr comes with built-in progress bars to show how long it’s taking
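A small sketch of turning the progress bar on, assuming purrr 1.0.0 or later, where map() accepts a `.progress` argument. The function `slow_step()` is a made-up stand-in for real work, and the bar only appears in an interactive session once the job has run for a couple of seconds.

```r
library(purrr)

# Hypothetical slow task
slow_step <- function(x) {
  Sys.sleep(0.1)
  x * 2
}

# .progress = TRUE asks map() to display a progress bar while it runs
out <- map(1:30, slow_step, .progress = TRUE)
```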